A practical guide to model selection

Author

  • Isabelle Guyon
Abstract

Slides accompanying this material are available at http://clopinet.com/isabelle/Projects/MLSS08/. In this chapter, we focus on the problem of model selection. The other topics covered in the class are developed in tutorials on feature selection (Guyon and Elisseeff, 2003; Guyon et al., 2006a) and causality (Guyon et al., 2007a; Guyon, 2008; Guyon et al., 2009a). Many machine learning and data mining packages now provide highly optimized implementations of leading algorithms, including commercial platforms like SAS and SPSS and freeware packages like Weka, R, Lush, and several Matlab libraries such as the Spider and CLOP. The MLOSS open source repository indexes such resources. In this context, the task of practitioners has shifted from identifying and implementing algorithms to learning how to use such packages and selecting the best model. Model selection remains largely the user’s responsibility. Best practices for model selection have emerged over the years, grounded in a wide variety of theories (regularization, Bayesian priors, MDL, structural risk minimization, bias/variance tradeoff, etc.; see Appendix A). Interestingly, all those theories converge towards the same principle, stated as early as the 14th century by William of Ockham: “Pluralitas non est ponenda sine necessitate”, which prescribes limiting model complexity to the minimum necessary to explain the data, that is, shaving off unnecessary parameters (Ockham’s razor). Indeed, as illustrated in Figure 1, attempts to improve prediction performance on training data by increasing model complexity may lead to data “over-fitting”: the good

Publication date: 2009